Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors

Authors

  • Jeffrey Johnson
  • Scott J. Krieder
  • Benjamin Grimmer
  • Justin M. Wozniak
  • Michael Wilde
  • Ioan Raicu
Abstract

Many-Task Computing (MTC) aims to bridge the gap between high-performance computing (HPC) and high-throughput computing (HTC). MTC emphasizes running many computational tasks over a short period of time, where tasks can be either dependent on or independent of one another. MTC has been well supported on Clouds, Grids, and Supercomputers on traditional computing architectures, but the abundance of hybrid large-scale systems using accelerators has motivated us to explore the support of MTC on the new Intel Xeon Phi accelerators. The Xeon Phi is a PCI-Express based expansion card comprising 60 cores that support 240 hardware threads and produce up to 1 teraflop of double-precision performance in a single accelerator. These cards are already being integrated into supercomputing clusters such as Stampede, which hosts over 6,400 Xeon Phi accelerators totaling over 7 petaflops of double-precision performance. This work provides an in-depth understanding of MTC on the Intel Xeon Phi and presents our preliminary results of running several different workloads on pre-production Intel Xeon Phi hardware. By using Intel's SCIF protocol to communicate across the PCI-Express bus, our batch framework achieves over 90% efficiency, nearing or outperforming OpenMP offloading, for tasks longer than 300 µs. This performance opens the opportunity to develop a framework for executing heterogeneous tasks on the Xeon Phi alongside other potential accelerators, including graphics cards, for MTC applications. Our framework will provide fine granularity for executing MTC applications across large-scale compute clusters. It will be integrated with our existing graphics card framework, GeMTC, to provide transparent access to GPUs, Xeon Phis, and future generations of accelerators to help bridge the gap into Exascale computing.

Keywords: MIMD, MTC, Accelerator, Intel Xeon Phi, Coprocessor

Similar resources

Unified Development for Mixed Multi-GPU and Multi-Coprocessor Environments using a Lightweight Runtime Environment

Many of the heterogeneous resources available to modern computers are designed for different workloads. In order to efficiently use GPU resources, the workload must have a greater degree of parallelism than a workload designed for multicore CPUs. And conceptually, the Intel Xeon Phi coprocessors are capable of handling workloads somewhere in between the two. This multitude of applicable workloads wi...

First experiences with the Intel MIC architecture at LRZ

With the rapidly growing demand for computing power, new accelerator-based architectures have entered the world of high performance computing over roughly the past 5 years. In particular, GPGPUs have recently become very popular; however, programming GPGPUs using programming languages like CUDA or OpenCL is cumbersome and error-prone. Trying to overcome these difficulties, Intel developed their own Many Int...

Exploring SIMD for Molecular Dynamics, Using Intel

We analyse gather-scatter performance bottlenecks in molecular dynamics codes and the challenges that they pose for obtaining benefits from SIMD execution. This analysis informs a number of novel code-level and algorithmic improvements to Sandia’s miniMD benchmark, which we demonstrate using three SIMD widths (128-, 256-, and 512-bit). The applicability of these optimisations to wider SIMD is discu...

Matrix factorization routines on heterogeneous architectures

In this work we consider a method for parallelizing matrix factorization algorithms on systems with Intel® Xeon Phi™ coprocessors. We provide performance results of matrix factorization routines implementing this approach and available in Intel® Math Kernel Library (Intel MKL) on the Intel® Xeon® processor line with Intel Xeon Phi coprocessors. Summary New heterogeneous systems consisting...

Task-Based Cholesky Decomposition on Knights Corner Using OpenMP

The growing popularity of the Intel Xeon Phi coprocessors and the continued development of this new many-core architecture have created the need for an open-source, scalable, and cross-platform task-based dense linear algebra package that can efficiently use this type of hardware. In this paper, we examined the design modifications necessary when porting PLASMA, a task-based dense linear algebra...

Publication date: 2013